Bilingual Machine-Aided Indexing
Abstract
Multilingual documentation has become commonplace in our Information Society. This documentation is usually categorised by hand, which is time-consuming and arduous. This is particularly true of keyword assignment, in which a list of keywords (descriptors) from a controlled vocabulary (thesaurus) is assigned to a document. One way to alleviate this problem is offered by so-called Machine-Aided Indexing (MAI) systems. These systems work in cooperation with professional indexers by providing an initial list of descriptors from which the most appropriate ones are selected. This way of proceeding increases productivity and eases the indexers' task. In this paper, we propose a statistical text classification framework for bilingual documentation, from which we derive two novel bilingual classifiers based on the naive combination of monolingual classifiers. We report preliminary results on the multilingual corpus Acquis Communautaire (AC) that demonstrate the suitability of the proposed classifiers as the backend of a fully working MAI system.
Related resources
Machine-Aided Indexing at NASA
This report describes the NASA Lexical Dictionary (NLD), a machine-aided indexing system used online at the National Aeronautics and Space Administration's Center for AeroSpace Information (CASI). This system automatically suggests a set of candidate terms from NASA's controlled vocabulary for any designated natural language text input. The system is comprised of a text processor that is based ...
Sentence alignment in bilingual corpora based on crosslingual querying
The effectiveness of translation memory for computer-aided translation depends on the results of previous sentence alignment. This paper describes a new approach to sentence alignment, based on crosslingual querying using the technology of an existing product, SPIRIT (Syntactic and Probabilistic Indexing and Retrieval of Information in Texts). Sentence alignment and crosslingual querying base...
NCU in Bilingual Information Retrieval Experiments at NTCIR-6
In this paper, we present mono-lingual and bilingual ad-hoc information retrieval experimental results at NTCIR-6. This year we compare two different word tokenization levels for indexing, namely unigram and overlapping bigram. Two well-known information retrieval models, the language model and BM-25, were adopted in our study. The mono-lingual results show that our method achieved t...
Automatic Bilingual Lexicon Acquisition Using Random Indexing of Aligned Bilingual Data
This paper presents a very simple and effective approach to automatic bilingual lexicon acquisition. The approach is cooccurrence-based, and uses the Random Indexing vector space methodology applied to aligned bilingual data. The approach is simple, efficient and scalable, and generates promising results when compared to a manually compiled lexicon. The paper also discusses some of the methodolo...
Bilingual Indexing for Information Retrieval with AUTINDEX
AUTINDEX is a bilingual automatic indexing system for the two languages German and English. It is being developed within the EU-funded BINDEX project. The aim of the system is to automatically index large quantities of abstracts of scientific and technical papers from several areas of engineering. Automatic indexing takes place using a controlled vocabulary provided in monolingual and bilingual...